
6 Scary Predictions for AI in 2026

WIRED

Could the AI industry be on the verge of its first major layoffs? Will China spread propaganda to slow the US data-center building boom? Where are AI agents headed? AI-powered robots are just one of the topics likely to grab headlines in 2026. When OpenAI declared a "code red" this month to refocus its teams on competing with Google, I couldn't help but think back to December three years ago when the companies' roles were reversed.


VacuumVLA: Boosting VLA Capabilities via a Unified Suction and Gripping Tool for Complex Robotic Manipulation

Zhou, Hui, Huang, Siyuan, Li, Minxing, Zhang, Hao, Fan, Lue, Shi, Shaoshuai

arXiv.org Artificial Intelligence

Vision-Language-Action (VLA) models have significantly advanced general-purpose robotic manipulation by harnessing large-scale pretrained vision and language representations. Most current VLA systems employ parallel two-finger grippers as their default end effectors. However, such grippers face inherent limitations in certain real-world tasks, such as wiping glass surfaces or opening drawers without handles, due to insufficient contact area or lack of adhesion. To overcome these challenges, we present a low-cost, integrated hardware design that combines a mechanical two-finger gripper with a vacuum suction unit, enabling dual-mode manipulation within a single end effector. Our system supports flexible switching between, or synergistic use of, both modalities, expanding the range of feasible tasks. We validate the efficiency and practicality of our design within two state-of-the-art VLA frameworks: DexVLA and Pi0. Experimental results demonstrate that with the proposed hybrid end effector, robots can successfully perform multiple complex tasks that are infeasible for conventional two-finger grippers alone. All hardware designs and control systems will be released.
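The dual-mode idea can be sketched as a minimal mode-selection policy: flat or handle-less surfaces favor suction, graspable objects favor the gripper, and some tasks use both. The class name, mode labels, and task-to-mode mapping below are illustrative assumptions, not the paper's actual controller.

```python
# Illustrative sketch of a hybrid end effector that switches between
# (or combines) a two-finger gripper and a vacuum suction unit.
# Class, modes, and the task-to-mode mapping are hypothetical.

class HybridEndEffector:
    MODES = {"grip", "suction", "both"}

    def __init__(self):
        self.mode = "grip"

    def select_mode(self, task: str) -> str:
        # Flat or handle-less targets favor suction; graspable objects favor gripping.
        if task in {"wipe_glass", "open_handleless_drawer"}:
            self.mode = "suction"
        elif task in {"peel_and_lift"}:
            self.mode = "both"  # synergistic use of both modalities
        else:
            self.mode = "grip"
        return self.mode


effector = HybridEndEffector()
print(effector.select_mode("wipe_glass"))  # suction
print(effector.select_mode("pick_mug"))    # grip
```

In a real system this decision would come from the VLA policy rather than a hand-written lookup; the sketch only shows why a single end effector exposing both modalities widens the feasible task set.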


AIRoA MoMa Dataset: A Large-Scale Hierarchical Dataset for Mobile Manipulation

Takanami, Ryosuke, Khrapchenkov, Petr, Morikuni, Shu, Arima, Jumpei, Takaba, Yuta, Maeda, Shunsuke, Okubo, Takuya, Sano, Genki, Sekioka, Satoshi, Kadoya, Aoi, Kambara, Motonari, Nishiura, Naoya, Suzuki, Haruto, Yoshimoto, Takanori, Sakamoto, Koya, Ono, Shinnosuke, Yang, Hu, Yashima, Daichi, Horo, Aoi, Motoda, Tomohiro, Chiyoma, Kensuke, Ito, Hiroshi, Fukuda, Koki, Goto, Akihito, Morinaga, Kazumi, Ikeda, Yuya, Kawada, Riko, Yoshikawa, Masaki, Kosuge, Norio, Noguchi, Yuki, Ota, Kei, Matsushima, Tatsuya, Iwasawa, Yusuke, Matsuo, Yutaka, Ogata, Tetsuya

arXiv.org Artificial Intelligence

As robots transition from controlled settings to unstructured human environments, building generalist agents that can reliably follow natural language instructions remains a central challenge. Progress in robust mobile manipulation requires large-scale multimodal datasets that capture contact-rich and long-horizon tasks, yet existing resources lack synchronized force-torque sensing, hierarchical annotations, and explicit failure cases. We address this gap with the AIRoA MoMa Dataset, a large-scale real-world multimodal dataset for mobile manipulation. It includes synchronized RGB images, joint states, six-axis wrist force-torque signals, and internal robot states, together with a novel two-layer annotation schema of sub-goals and primitive actions for hierarchical learning and error analysis. The initial dataset comprises 25,469 episodes (approx. 94 hours) collected with the Human Support Robot (HSR) and is fully standardized in the LeRobot v2.1 format. By uniquely integrating mobile manipulation, contact-rich interaction, and long-horizon structure, AIRoA MoMa provides a critical benchmark for advancing the next generation of Vision-Language-Action models. The first version of our dataset is now available at https://huggingface.co/datasets/airoa-org/airoa-moma .


BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities

Jiang, Yunfan, Zhang, Ruohan, Wong, Josiah, Wang, Chen, Ze, Yanjie, Yin, Hang, Gokmen, Cem, Song, Shuran, Wu, Jiajun, Fei-Fei, Li

arXiv.org Artificial Intelligence

Real-world household tasks present significant challenges for mobile manipulation robots. An analysis of existing robotics benchmarks reveals that successful task performance hinges on three key whole-body control capabilities: bimanual coordination, stable and precise navigation, and extensive end-effector reachability. Achieving these capabilities requires careful hardware design, but the resulting system complexity further complicates visuomotor policy learning. To address these challenges, we introduce the BEHAVIOR Robot Suite (BRS), a comprehensive framework for whole-body manipulation in diverse household tasks. Built on a bimanual, wheeled robot with a 4-DoF torso, BRS integrates a cost-effective whole-body teleoperation interface for data collection and a novel algorithm for learning whole-body visuomotor policies. We evaluate BRS on five challenging household tasks that not only emphasize the three core capabilities but also introduce additional complexities, such as long-range navigation, interaction with articulated and deformable objects, and manipulation in confined spaces. We believe that BRS's integrated robotic embodiment, data collection interface, and learning framework mark a significant step toward enabling real-world whole-body manipulation for everyday household tasks. BRS is open-sourced at https://behavior-robot-suite.github.io/


From Vocal Instructions to Household Tasks: The Inria Tiago++ in the euROBIN Service Robots Coopetition

Amadio, Fabio, Donoso, Clemente, Totsila, Dionis, Lorenzo, Raphael, Rouxel, Quentin, Rochel, Olivier, Hoffman, Enrico Mingo, Mouret, Jean-Baptiste, Ivaldi, Serena

arXiv.org Artificial Intelligence

Abstract--This paper describes the Inria team's integrated robotics system used in the 1st euROBIN coopetition, during which service robots performed voice-activated household tasks in a kitchen setting. The key contributions (open-sourced) are the integration of these components and the design of custom teleoperation devices, addressing practical challenges in the deployment of service robots. euROBIN is a Network of Excellence in AI and Robotics, funded by the European Commission. Among its objectives, it promotes the transfer of robotics and AI software, methods, and practices, in part by organizing competitions. Twenty teams participated in the 1st coopetition, across three different leagues. The Inria team participated in the Service Robots League, which included six teams. The robot was requested to understand and execute standard instructions following two patterns: (1) pick an object from a designated location and place it at another location; (2) pick an object from a designated location and deliver it to a person. Teams also had to extend the use of the platforms to new or unexpected situations and cope with failures that might occur during autonomous operation.
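The two standard instruction patterns can be captured by a toy parser: match the instruction text against pick-and-place or pick-and-deliver templates and extract the slots. The regular expressions and the returned dictionary layout are illustrative assumptions, not the Inria system's actual grammar.

```python
# Toy parser for the two standard instruction patterns:
# (1) pick <object> from <location>, place at another <location>
# (2) pick <object> from <location>, deliver to a <person>
# Regexes and the result structure are hypothetical illustrations.
import re

PICK_PLACE = re.compile(
    r"pick (?:up )?the (?P<obj>\w+) from the (?P<src>\w+) "
    r"and place it (?:at|on) the (?P<dst>\w+)")
PICK_DELIVER = re.compile(
    r"pick (?:up )?the (?P<obj>\w+) from the (?P<src>\w+) "
    r"and deliver it to (?P<person>\w+)")


def parse_instruction(text: str) -> dict:
    m = PICK_PLACE.search(text)
    if m:
        return {"pattern": "pick_and_place", **m.groupdict()}
    m = PICK_DELIVER.search(text)
    if m:
        return {"pattern": "pick_and_deliver", **m.groupdict()}
    return {"pattern": "unknown"}


print(parse_instruction("pick the cup from the table and place it on the shelf"))
```

A deployed system would of course use speech recognition plus a more robust language-understanding stack; the sketch only makes the two-pattern task structure concrete.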


Robot Behavior Personalization from Sparse User Feedback

Patel, Maithili, Chernova, Sonia

arXiv.org Artificial Intelligence

As service robots become more general-purpose, they will need to adapt to their users' preferences over the large set of all possible tasks that they can perform. This includes preferences regarding which actions the users prefer to delegate to robots as opposed to doing themselves. Existing personalization approaches require task-specific data for each user. To handle diversity across all household tasks and users, and nuances in user preferences across tasks, we propose to learn a task adaptation function independently, which can be used in tandem with any universal robot policy to customize robot behavior. We create the Task Adaptation using Abstract Concepts (TAACo) framework. TAACo can learn to predict the user's preferred manner of assistance with any given task, by mediating reasoning through a representation composed of abstract concepts built from user feedback. TAACo can generalize to an open set of household tasks from a small amount of user feedback and explain its inferences through intuitive concepts. We evaluate our model on a dataset of 5 people's preferences that we collected, and show that TAACo outperforms GPT-4 by 16% and a rule-based system by 54% in prediction accuracy, given 40 samples of user feedback.
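A toy version of concept-mediated preference prediction: represent each task by abstract concepts, learn per-concept preference scores from sparse feedback, and generalize to unseen tasks through concept overlap. The concepts, tasks, and averaging rule below are made-up illustrations; TAACo's actual model is considerably more involved.

```python
# Toy concept-mediated preference predictor in the spirit of TAACo:
# each task maps to abstract concepts, per-concept scores are fit from
# sparse feedback, and unseen tasks are predicted via shared concepts.
# All tasks, concepts, and labels are fabricated for illustration.
from collections import defaultdict

TASK_CONCEPTS = {
    "load_dishwasher": {"cleaning", "kitchen"},
    "fold_laundry":    {"tidying", "delicate"},
    "wipe_counter":    {"cleaning", "kitchen"},
    "water_plants":    {"care", "delicate"},
}


def fit(feedback):
    """feedback: {task: +1 (delegate to robot) or -1 (do myself)}."""
    scores = defaultdict(list)
    for task, label in feedback.items():
        for concept in TASK_CONCEPTS[task]:
            scores[concept].append(label)
    return {c: sum(v) / len(v) for c, v in scores.items()}


def predict(scores, task):
    # Average the scores of the task's concepts; unseen concepts count 0.
    vals = [scores.get(c, 0.0) for c in TASK_CONCEPTS[task]]
    return "delegate" if sum(vals) > 0 else "do_myself"


scores = fit({"load_dishwasher": +1, "fold_laundry": -1})
print(predict(scores, "wipe_counter"))  # delegate: shares 'cleaning', 'kitchen'
```

The point of the sketch is the mediation step: feedback never attaches to a task directly, only to its concepts, which is what lets two labeled tasks inform predictions about an open set of unlabeled ones.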


Robi Butler: Remote Multimodal Interactions with Household Robot Assistant

Xiao, Anxing, Janaka, Nuwan, Hu, Tianrun, Gupta, Anshul, Li, Kaixin, Yu, Cunjun, Hsu, David

arXiv.org Artificial Intelligence

In this paper, we introduce Robi Butler, a novel household robotic system that enables multimodal interactions with remote users. Building on advanced communication interfaces, Robi Butler allows users to monitor the robot's status, send text or voice instructions, and select target objects by hand pointing. At the core of our system is a high-level behavior module, powered by Large Language Models (LLMs), that interprets multimodal instructions to generate action plans. These plans are composed of a set of open-vocabulary primitives supported by Vision Language Models (VLMs) that handle both text and pointing queries. The integration of the above components allows Robi Butler to ground remote multimodal instructions in the real-world home environment in a zero-shot manner. We demonstrate the effectiveness and efficiency of this system on a variety of daily household tasks that involve remote users giving multimodal instructions. Additionally, we conduct a user study to analyze how multimodal interactions affect efficiency and user experience during remote human-robot interaction, and discuss potential improvements.
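The plan-over-primitives pattern described above can be sketched as a dispatcher: the LLM emits a sequence of primitive calls with arguments, and the executor runs them in order. The primitive names, plan format, and stub implementations are hypothetical, not Robi Butler's actual API.

```python
# Illustrative sketch: an LLM-generated plan is a list of primitive
# calls, executed in order by a dispatcher. Primitives here are toy
# stubs standing in for VLM-backed skills; all names are hypothetical.

def execute_plan(plan, primitives):
    log = []
    for step in plan:
        skill = primitives[step["primitive"]]
        log.append(skill(**step["args"]))
    return log


# Toy open-vocabulary primitives.
primitives = {
    "goto":  lambda place: f"navigated to {place}",
    "pick":  lambda obj: f"picked {obj}",
    "place": lambda obj, place: f"placed {obj} on {place}",
}

# A plan an LLM might emit for "put the apple on the table".
plan = [
    {"primitive": "goto",  "args": {"place": "kitchen"}},
    {"primitive": "pick",  "args": {"obj": "apple"}},
    {"primitive": "place", "args": {"obj": "apple", "place": "table"}},
]
print(execute_plan(plan, primitives))
```

Keeping primitives as a flat name-to-callable table is what makes the set "open vocabulary": new skills can be registered without changing the planner-executor interface.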


LASP: Surveying the State-of-the-Art in Large Language Model-Assisted AI Planning

Li, Haoming, Chen, Zhaoliang, Zhang, Jonathan, Liu, Fei

arXiv.org Artificial Intelligence

Effective planning is essential for the success of any task, from organizing a vacation to routing autonomous vehicles and developing corporate strategies. It involves setting goals, formulating plans, and allocating resources to achieve them. LLMs are particularly well-suited for automated planning due to their strong capabilities in commonsense reasoning. They can deduce a sequence of actions needed to achieve a goal from a given state and identify an effective course of action. However, it is frequently observed that plans generated through direct prompting often fail upon execution. Our survey aims to highlight the existing challenges in planning with language models, focusing on key areas such as embodied environments, optimal scheduling, competitive and cooperative games, task decomposition, reasoning, and planning. Through this study, we explore how LLMs transform AI planning and provide unique insights into the future of LM-assisted planning.


ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments

Kim, Taewoong, Min, Cheolhong, Kim, Byeonghwi, Kim, Jinyeon, Jeung, Wonje, Choi, Jonghyun

arXiv.org Artificial Intelligence

Simulated virtual environments have been widely used to train robotic agents that perform daily household tasks. These environments have driven considerable research progress, but often provide limited object interactability, visual appearance that differs from real-world environments, or relatively small environment sizes. This prevents models learned in virtual scenes from being readily deployable. To bridge the gap between these learning environments and deployment (i.e., real) environments, we propose the ReALFRED benchmark, which employs real-world scenes, objects, and room layouts to train agents to complete household tasks by understanding free-form language instructions and interacting with objects in large, multi-room, 3D-captured scenes. Specifically, we extend the ALFRED benchmark with larger environmental spaces and smaller visual domain gaps. With ReALFRED, we analyze methods previously developed for the ALFRED benchmark and observe that they consistently yield lower performance on all metrics, encouraging the community to develop methods for more realistic environments. Our code and data are publicly available.


This new system can teach a robot a simple household task within 20 minutes

MIT Technology Review

While other types of AI, such as large language models, are trained on huge repositories of data scraped from the internet, the same can't be done with robots, because the data needs to be physically collected. This makes it a lot harder to build and scale training databases. Similarly, while it's relatively easy to train robots to execute tasks inside a laboratory, these conditions don't necessarily translate to the messy unpredictability of a real home. To combat these problems, the team came up with a simple, easily replicable way to collect the data needed to train Dobb-E--using an iPhone attached to a reacher-grabber stick, the kind typically used to pick up trash. Then they set the iPhone to record videos of what was happening.